Abstract

In this paper, a shrinkage estimator for the population mean is
proposed under arbitrary quadratic loss functions with unknown
covariance matrices. The new estimator is non-parametric in the sense
that it does not assume a specific parametric distribution for the data
and it does not require prior information on the population
covariance matrix. Analytical results on the improvement of the proposed
shrinkage estimator are provided and some corresponding asymptotic
properties are also derived. Finally, we demonstrate the practical
improvement of the proposed method over existing methods through
extensive simulation studies and real data analysis.
1 Introduction

High-throughput molecular technologies that enable researchers to collect
and monitor information at the genome level have revolutionized the field of
biology in the past fifteen years. These technologies offer an unprecedented
amount and diversity of data that reveal different aspects of biological
processes. One such example is microarray data, where the expression levels of
thousands of genes are measured simultaneously from each sample. These
data have motivated the development of reliable biomarkers for disease
subtype classification and diagnosis, and for the identification of novel targets
for drug treatment. Due to the cost and other experimental difficulties, such
as the availability of biological materials, high-throughput data are commonly
collected only for a limited number of samples. They are often referred to as
high-dimension, low-sample-size data, or "large p small n" data,
where p is the number of genes or dimensions and n is the sample size.
High-dimensional data pose many challenges to traditional statistical and
computational methods. Specifically, due to the small n, there is more
uncertainty associated with standard estimates of parameters such as the
mean and variance. As a consequence, statistical analyses based
on such parameter estimates are usually unreliable.

To obtain more accurate parameter estimates, statistical methods such as
shrinkage may yield better results. In the last decade, researchers
have proposed quite a few shrinkage-based methods under the "large p small n"
setting, with particular interest in variance estimation (Tusher et al., 2001;
Baldi and Long, 2001; Smyth, 2004; Cui et al., 2005; Tong and Wang, 2007;
Tong et al., 2012b) and in covariance matrix estimation (Ledoit and Wolf, 2004a;
Schäfer and Strimmer, 2005; Pourahmadi, 2011; Cai and Yuan, 2012). Apart from
the progress made on variance and covariance matrix estimation, some attention
has been paid recently to the estimation of the population mean $\mu$ under
the "large p small n" setting (Hwang and Liu, 2010; Tong et al., 2012a).
An accurate estimate of $\mu$ is desired in many areas of statistical analysis,
e.g., in linear discriminant analysis (Anderson, 2003), diagonal linear
discriminant analysis (Dudoit et al., 2002), Markowitz mean-variance analysis
(Markowitz, 1952; El Karoui, 2010), and so on.

Under the assumption that $\mu$ is sparse, Shao et al. (2011) proposed a
consistent estimator for $\mu$ under some regularity conditions. However, in many
real problems, there is often little prior information on $\mu$ and it may not
necessarily have a sparse structure. In such situations, shrinkage estimation
of $\mu$ can be applied. Shrinkage estimation starts with the amazing result of
James and Stein (1961) that the commonly used sample mean of a normal
distribution is inadmissible and can be improved by shrinkage estimators.
We refer to them as James-Stein type estimators. Since then, there is a
large body of literature on shrinkage estimation, e.g.,
Efron and Morris (1973), Gleser (1986), Fourdrinier et al. (2003), etc.
In the literature most existing
methods either assumed that the covariance matrix $\Sigma_p$ is known or assumed
that there exists an estimator of $\Sigma_p$ that is invertible. As a common practice,
if the sample covariance matrix $S_n$ is used to estimate $\Sigma_p$, the sample size
is required to be larger than the dimension, i.e. $n > p$, to avoid
singularity. Note, however, that for high-dimensional data it is common that p is
much larger than n. Therefore, the traditional shrinkage methods cannot
be applied to analyze high-dimensional data directly.

To overcome the singularity problem, Tong et al. (2012a) proposed a
new shrinkage estimator for $\mu$ by assuming that $\Sigma_p$ has a diagonal
structure. This assumption is equivalent to assuming that the genes are
independent of each other. Though it may not be realistic, we note that
the diagonal assumption on $\Sigma_p$ has been made frequently in different
aspects of high-dimensional data analysis, e.g., in high-dimensional
classification (Fan and Fan, 2008; Pang et al., 2009), in shrinkage estimation of
variances (Tong and Wang, 2007; Hwang et al., 2009) and the references therein.
Needless to say, the diagonal assumption is very restrictive. Recently, Ramey
(2012) and Fan et al. (2012) pointed out that the diagonal related discriminant
classifiers in Dudoit et al. (2002), Pang et al. (2009) and Tong et al.
(2012a) can be suboptimal in real data classification owing to the
information loss in off-diagonal elements. In addition, the shrinkage method in
Tong et al. (2012a) requires the data to be Gaussian distributed through a
Bayesian model. These restrictions have largely limited the usage of existing
shrinkage methods in high-dimensional data. It is also worth pointing out
that the research so far has concentrated on the modelling and little is known
about the theoretical properties of various shrinkage estimators.

Inspired by Ledoit and Wolf (2004b), in this paper we consider
shrinkage estimation for $\mu$ under arbitrary quadratic loss functions with an unknown
non-diagonal covariance matrix. The new estimator is non-parametric in the
sense that it does not assume a specific parametric distribution for the data
and it does not require prior information on the covariance matrix $\Sigma_p$. We
will demonstrate by both theoretical and empirical studies that the proposed
estimator has good properties for a wide range of settings. We will also show
that the proposed method is better than the sample mean and the existing
shrinkage methods even under a diagonal covariance matrix assumption. The
rest of the paper is organized as follows. Section 2 introduces the
theoretical optimal shrinkage estimation under quadratic risks. Section 3 develops
a data-driven shrinkage estimator and derives the asymptotic properties of
the proposed estimator. We then evaluate the proposed optimal shrinkage
estimator and compare it with existing shrinkage methods using simulated
data in Section 4 and real data in Section 5.
Finally, we conclude the paper in Section 6 and provide the technical results
in the Appendix.



2 Methodology 



Let $X_1, \cdots, X_n$ be independent and identically distributed (i.i.d.)
observations satisfying the multivariate model

$$X_k = \Sigma_p^{1/2}\epsilon_k + \mu, \quad k = 1, \cdots, n, \tag{2.1}$$



where $\mu$ is a p-dimensional vector, $\Sigma_p$ is a positive definite matrix and the
random errors in $(\epsilon_{ij})_{p \times n} = (\epsilon_1, \cdots, \epsilon_n)$ are i.i.d. with zero mean, unit variance
and finite fourth moment. Note that model (2.1) is the same as those
in Bai and Saranadasa (1996) and Chen et al. (2010). In this paper, we do
not assume that the data follow a multivariate normal distribution with mean
$\mu$ and covariance matrix $\Sigma_p$. Given model (2.1), we consider estimating $\mu$
under the following quadratic loss function (Berger, 1976; Berger et al., 1977;
Gleser, 1986),

$$L_Q(\delta) = n(\delta - \mu)'Q(\delta - \mu)/\mathrm{tr}(Q\Sigma_p), \tag{2.2}$$

where $\delta = \delta(X_1, \cdots, X_n)$ is the estimator of $\mu$, $Q$ is a known positive definite
matrix, and $\mathrm{tr}(A)$ stands for the trace of matrix $A$. Note that for the standard
sample mean $\bar X = (1/n)\sum_{k=1}^n X_k$, the risk function is $E[L_Q(\bar X)] = 1$.
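As a rough numerical illustration of the normalization in (2.2), the following sketch estimates the risk of the sample mean by Monte Carlo. The Gaussian errors, the particular choices of $\mu$, $\Sigma_p$ and $Q$, and the function names `quadratic_loss` and `sample_mean_risk` are assumptions made only for this example; the paper itself does not require normality.

```python
import numpy as np

def quadratic_loss(delta, mu, Q, Sigma, n):
    """Loss (2.2): n (delta - mu)' Q (delta - mu) / tr(Q Sigma)."""
    d = delta - mu
    return n * (d @ Q @ d) / np.trace(Q @ Sigma)

def sample_mean_risk(mu, Sigma, Q, n, reps=2000, seed=0):
    """Monte Carlo estimate of E[L_Q(X_bar)], assuming Gaussian data."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(reps):
        X = rng.multivariate_normal(mu, Sigma, size=n)  # n rows, p columns
        losses.append(quadratic_loss(X.mean(axis=0), mu, Q, Sigma, n))
    return float(np.mean(losses))

# Hypothetical example values, chosen only for illustration.
p, n = 20, 5
mu = np.linspace(-1.0, 1.0, p)
Sigma = np.diag(np.linspace(1.0, 3.0, p))
Q = np.eye(p)
risk = sample_mean_risk(mu, Sigma, Q, n)
print(round(risk, 2))  # should be close to 1, up to Monte Carlo error
```

The average should sit near 1 because $E(\bar X - \mu)'Q(\bar X - \mu) = \mathrm{tr}(Q\Sigma_p)/n$, which is exactly the factor the loss divides out.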

In the special case when $X_1, \cdots, X_n$ are multivariate normally distributed,
James and Stein (1961) showed that

$$\delta_{JS} = \Big(1 - \frac{p-2}{n\bar X'\bar X}\Big)\bar X \tag{2.3}$$

dominates $\bar X$ for any $p > 2$ under the assumption that $\Sigma_p = Q = I_p$. This
result was then extended by Baranchik (1970) to $\Sigma_p = \sigma^2 I_p$ with $\sigma^2$
unknown, and by Efron and Morris (1973) to a Bayesian estimator. For a
general unknown $\Sigma_p$, the James-Stein estimator has the form (Lin and Tsai,
1973; Berger, 1976; Berger et al., 1977; Gleser, 1986; Fourdrinier et al.,
2003)

$$\Big(I_p - \frac{r(Q, S_n^{-1}, \bar X)}{\bar X'S_n^{-1}\bar X}\Big)\bar X, \tag{2.4}$$

where $S_n$ is the sample covariance matrix which is defined as

$$S_n = \frac{1}{n-1}\sum_{k=1}^n (X_k - \bar X)(X_k - \bar X)'.$$

To guarantee that $S_n$ is invertible, $n > p$ is necessary, which means the method is
not applicable to "large p small n" data.
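The singularity can be seen directly in a small numerical sketch. The dimensions and the standard normal data below are arbitrary choices for illustration; the point is only that a sample covariance matrix built from n observations has rank at most n - 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 10                       # fewer observations than dimensions
X = rng.standard_normal((n, p))    # rows are observations (synthetic data)
S = np.cov(X, rowvar=False)        # p x p sample covariance, divisor n - 1
rank = np.linalg.matrix_rank(S)
print(S.shape, rank)               # the rank cannot exceed n - 1, so S is singular
```

Because centering removes one degree of freedom, $S_n^{-1}$ in (2.4) does not exist whenever $n \le p$.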



To overcome the singularity problem, Tong et al. (2012a) considered a
special situation where $\Sigma_p$ is diagonal. Specifically, under the loss function
with $Q = \Sigma_p^{-1}$ they constructed a hierarchical Bayesian model and then
proposed the following shrinkage estimator,

$$\delta_T = \Big(1 - \frac{(p-2)(n-1)}{n(n-3)\bar X'D_n^{-1}\bar X}\Big)\bar X, \tag{2.5}$$

where $D_n = \mathrm{diag}(S_n)$ is the diagonal sample covariance matrix. Other related
works under a diagonal $\Sigma_p$ and a diagonal $Q$ include Berger and Bock
(1976) and Shinozaki (1980). For an arbitrary $Q$ with non-diagonal
$\Sigma_p$, however, it remains a challenging yet unanswered question under the "large p small
n" setting. To address this question, we consider estimating $\mu$ by a linear
combination of $\bar X$ and $e = (1, \cdots, 1)'$,

$$\delta = \alpha\bar X + \beta e.$$

The following theorem derives the optimal shrinkage coefficients for model
(2.1) under the quadratic loss (2.2) with an arbitrary $Q$.

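Theorem 2.1 Consider the optimization problem,

$$\min_{\alpha, \beta} E(\delta - \mu)'Q(\delta - \mu) \quad \text{s.t.} \quad \delta = \alpha\bar X + \beta e, \tag{2.6}$$

where the coefficients $\alpha$ and $\beta$ are non-random. The optimal shrinkage
estimator is given as $\mu^* = \alpha^*\bar X + \beta^* e$ where

$$\alpha^* = \frac{\mu'Q\mu - (e'Q\mu)^2/e'Qe}{\mu'Q\mu + \frac{1}{n}\mathrm{tr}(Q\Sigma_p) - (e'Q\mu)^2/e'Qe}, \qquad \beta^* = (1 - \alpha^*)\frac{e'Q\mu}{e'Qe},$$

and the corresponding risk of $\mu^*$ is

$$E(L_Q(\mu^*)) = \frac{\mu'Q\mu - (e'Q\mu)^2/e'Qe}{\mu'Q\mu + \frac{1}{n}\mathrm{tr}(Q\Sigma_p) - (e'Q\mu)^2/e'Qe}. \tag{2.7}$$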



Note that the proposed shrinkage estimator can accommodate any shift
of the grand mean, including the shift from $\mu$ to $\mu + ce$ where $c$ is a constant.
This is a similar idea to that in Lindley (1962), where the author shrunk the
observations towards the grand mean rather than towards the origin. Also in Tong et al.
(2012a), the authors applied their shrinkage method to the grand mean and
so the final estimator was a linear combination of two different components.
By Theorem 2.1, however, we point out that the method in Lindley (1962) is
not applicable for arbitrary $Q$. On this point, we will explain in the simulation
study an example where the grand mean is zero but $e'Q\mu \neq 0$.
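The oracle quantities in Theorem 2.1 are simple to evaluate when $\mu$ and $\Sigma_p$ are known. The sketch below is only an illustration: the function name `optimal_coefficients` and the small four-dimensional example are hypothetical, and the formulas are the ones obtained by minimizing (2.6) over $\alpha$ and $\beta$. The example is chosen so that the grand mean of $\mu$ is zero while $e'Q\mu \neq 0$.

```python
import numpy as np

def optimal_coefficients(mu, Sigma, Q, n):
    """Oracle alpha*, beta* and risk for delta = alpha * X_bar + beta * e."""
    e = np.ones(mu.shape[0])
    eQe = e @ Q @ e
    eQmu = e @ Q @ mu
    pi1 = np.trace(Q @ Sigma) / n          # E (X_bar - mu)' Q (X_bar - mu)
    pi2 = mu @ Q @ mu - eQmu ** 2 / eQe    # Q-weighted spread of mu around e
    alpha = pi2 / (pi1 + pi2)
    beta = (1.0 - alpha) * eQmu / eQe
    risk = pi2 / (pi1 + pi2)               # E[L_Q(mu*)], relative to the sample mean
    return alpha, beta, risk

# Hypothetical example: grand mean of mu is zero, but e'Q mu is not.
n = 5
mu = np.array([1.0, 1.0, -1.0, -1.0])
Sigma = np.diag([1.0, 1.0, 4.0, 4.0])
Q = np.diag(1.0 / np.diag(Sigma))
alpha, beta, risk = optimal_coefficients(mu, Sigma, Q, n)
print(alpha, beta, risk)
```

Here $\beta^*$ is non-zero even though the entries of $\mu$ average to zero, which is the situation the remark about Lindley (1962) has in mind.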



3 Data-driven shrinkage estimators for population means



Note that the shrinkage coefficients $\alpha^*$ and $\beta^*$ are unknown and need to
be estimated in practice. In this section, we propose to estimate them
by U-statistics, motivated by Chen et al. (2010), Cai and Ma (2012) and
Li and Chen (2012). Specifically, we estimate $\alpha^*$ and $\beta^*$ by

$$\hat\alpha^* = \frac{Y_{1,n} - Y_{3,n}}{Y_{1,n} + Y_{2,n} - Y_{3,n}} \quad \text{and} \quad \hat\beta^* = \frac{Y_{2,n}}{Y_{1,n} + Y_{2,n} - Y_{3,n}}\,Y_{4,n},$$
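where

$$Y_{1,n} = \frac{1}{p(n-1)}\sum_{i \neq j} X_i'QX_j, \qquad Y_{2,n} = \frac{1}{p}\Big(\frac{1}{n}\sum_{k=1}^n X_k'QX_k - \frac{1}{n(n-1)}\sum_{i \neq j} X_i'QX_j\Big),$$

$$Y_{3,n} = \frac{1}{p(n-1)\,e'Qe}\sum_{i \neq j} X_i'Qe\,e'QX_j, \qquad Y_{4,n} = \frac{1}{n\,e'Qe}\sum_{k=1}^n e'QX_k.$$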

The resulting estimator of $\mu$ is then $\hat\mu^* = \hat\alpha^*\bar X + \hat\beta^* e$. To derive the
asymptotic properties of the proposed estimator, we need the following regularity
condition.

Assumption 3.1 There is a constant $c_0$ (not depending on p or n) such that

$$c_0^{-1} \le \text{all eigenvalues of } \Sigma_p \text{ and } Q \le c_0.$$

Under Assumption 3.1, we have $\mathrm{tr}(\Sigma_pQ)/p = O(1)$. In this work, $o(1)$
denotes a sequence that converges to zero and $O(1)$ is
short for a sequence that is bounded. Similarly, $o_p(1)$ and $O_p(1)$ are the
corresponding notations in probability. For more details, one may refer to Van der Vaart (2000). Let

$$\pi_1 = E(\bar X - \mu)'Q(\bar X - \mu) = \frac{1}{n}\mathrm{tr}(Q\Sigma_p),$$

$$\pi_2 = \Big(\mu - \frac{e'Q\mu}{e'Qe}e\Big)'Q\Big(\mu - \frac{e'Q\mu}{e'Qe}e\Big).$$
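Before turning to the theory, the data-driven estimator of this section can be sketched in code. This is a minimal sketch, not the authors' implementation: it assumes the four statistics are the U-statistic estimators whose expectations are $(n/p)\mu'Q\mu$, $\mathrm{tr}(\Sigma_pQ)/p$, $n(\mu'Qe)^2/(p\,e'Qe)$ and $e'Q\mu/e'Qe$ (the expectations stated in the proof of Theorem 3.1), that $Q$ is symmetric, and the function name `shrinkage_mean` is hypothetical.

```python
import numpy as np

def shrinkage_mean(X, Q=None):
    """Plug-in shrinkage estimate alpha_hat * X_bar + beta_hat * e.

    X is an (n, p) array whose rows are observations; Q is a symmetric
    positive definite (p, p) matrix (identity if omitted).
    """
    n, p = X.shape
    if Q is None:
        Q = np.eye(p)
    e = np.ones(p)
    xbar = X.mean(axis=0)
    s = X.sum(axis=0)
    diag_sum = np.einsum("ki,ij,kj->", X, Q, X)        # sum_k X_k' Q X_k
    cross_sum = s @ Q @ s - diag_sum                     # sum_{i != j} X_i' Q X_j
    eQe = e @ Q @ e
    proj = X @ (Q @ e)                                   # e' Q X_k for each k
    cross_proj = proj.sum() ** 2 - np.sum(proj ** 2)     # sum_{i != j} (X_i'Qe)(e'QX_j)
    y1 = cross_sum / (p * (n - 1))
    y2 = (diag_sum / n - cross_sum / (n * (n - 1))) / p
    y3 = cross_proj / (p * (n - 1) * eQe)
    y4 = proj.mean() / eQe
    alpha = (y1 - y3) / (y1 + y2 - y3)
    beta = (1.0 - alpha) * y4
    return alpha * xbar + beta * e, alpha, beta

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=(8, 30))   # synthetic data, true mean 2 * e
mu_hat, alpha_hat, beta_hat = shrinkage_mean(X)
print(alpha_hat, beta_hat)
```

The double sums over $i \neq j$ are computed from totals minus diagonal terms, so the cost is linear in n rather than quadratic. No truncation of $\hat\alpha^*$ to $[0, 1]$ is applied, since the text does not describe one.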

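The following theorems establish the rates of convergence for the proposed
estimators and for the loss function.

Theorem 3.1 Under Assumption 3.1,

$$Y_{1,n} = \frac{n}{p}\mu'Q\mu + O_p\Big(\frac{1}{\sqrt{p}}\Big), \qquad Y_{2,n} = \frac{1}{p}\mathrm{tr}(\Sigma_pQ) + O_p\Big(\frac{1}{\sqrt{np}}\Big),$$

$$Y_{3,n} = \frac{n(\mu'Qe)^2}{p\,e'Qe} + O_p\Big(\frac{1}{p}\Big), \qquad Y_{4,n} = \frac{e'Q\mu}{e'Qe} + O_p\Big(\frac{1}{\sqrt{np}}\Big).$$

Further, we have

$$\hat\alpha^* = \alpha^* + O_p\Big(\frac{1}{\sqrt{p}}\Big) \quad \text{and} \quad \hat\beta^* = \beta^* + O_p\Big(\frac{1}{\sqrt{np}}\Big) + O_p\Big(\frac{e'Q\mu}{p^{3/2}}\Big).$$

Therefore, $\hat\alpha^* - \alpha^* \xrightarrow{P} 0$ and $\hat\beta^* - \beta^* \xrightarrow{P} 0$ as $p \to \infty$ and $p^{-3/2}e'Q\mu \to 0$,
where $\xrightarrow{P}$ denotes convergence in probability.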



Theorem 3.2 Under Assumption 3.1 and the "large p small n" setting,
the loss function of the shrinkage estimator $\hat\mu^*$ is

$$L_Q(\hat\mu^*) = \frac{\pi_2}{\pi_1 + \pi_2} + O_p\Big(\frac{1}{\sqrt{p}}\Big). \tag{3.8}$$

By Theorem 3.2, we note that $\hat\mu^*$ behaves at least as well as $\bar X$ when p is
large. The explicit improvement of $\hat\mu^*$ over $\bar X$ depends on $\pi_1$ and $\pi_2$. As in
Ledoit and Wolf (2004b), we define the percentage relative improvement in
average loss (PRIAL) over the sample mean as

$$\mathrm{PRIAL} = \frac{E(\bar X - \mu)'Q(\bar X - \mu) - E(\hat\mu^* - \mu)'Q(\hat\mu^* - \mu)}{E(\bar X - \mu)'Q(\bar X - \mu)}. \tag{3.9}$$

We then have the following corollary. 

Corollary 3.1 Let $s_n = \frac{n}{p}\big(\mu - \frac{e'Q\mu}{e'Qe}e\big)'Q\big(\mu - \frac{e'Q\mu}{e'Qe}e\big)$. As $p \to \infty$ we have
(I) If $s_n \to 0$, PRIAL $\xrightarrow{P} 1$;
(II) If $s_n \to C_0$, PRIAL $\xrightarrow{P} C_1 \in (0, 1)$;
(III) If $s_n \to \infty$, PRIAL $\xrightarrow{P} 0$.

Therefore, the shrinkage estimator $\hat\mu^*$ always performs better than $\bar X$ under
the loss function (2.2) when $s_n$ is finite. In the extreme case when $s_n \to \infty$,
$\hat\mu^*$ behaves similarly to $\bar X$.
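A short sketch of why the three cases arise may help. It assumes that $s_n$ is read as $(n/p)\pi_2$ and that the expected loss of $\hat\mu^*$ is replaced by its leading term from Theorem 3.2; it is a heuristic outline rather than a full proof.

```latex
\mathrm{PRIAL} \approx 1 - \frac{\pi_2}{\pi_1 + \pi_2}
  = \frac{\pi_1}{\pi_1 + \pi_2}
  = \frac{1}{1 + \pi_2/\pi_1},
\qquad
\frac{\pi_2}{\pi_1} = \frac{n\,\pi_2}{\mathrm{tr}(Q\Sigma_p)}
  = \frac{s_n}{\mathrm{tr}(Q\Sigma_p)/p}.
```

Under Assumption 3.1 the ratio $\mathrm{tr}(Q\Sigma_p)/p$ stays between $c_0^{-2}$ and $c_0^{2}$, so $\pi_2/\pi_1$ tends to zero, stays bounded away from zero and infinity, or diverges exactly when $s_n$ does, which gives cases (I), (II) and (III) respectively.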
4 Simulation studies

In this section, we conduct simulation studies to evaluate the performance of
the proposed shrinkage estimator $\hat\mu^*$ and compare it with the following four
estimators: the sample mean $\bar X$, the James-Stein estimator $\delta_{JS}$ in Baranchik
(1970), the Berger-Bock estimator $\delta_B$ in Berger and Bock (1976), and the
Tong et al. estimator $\delta_T$ in Tong et al. (2012a).

Note that the existing competitors, $\delta_{JS}$, $\delta_B$ and $\delta_T$, only
work with a diagonal covariance matrix under the "large p small n" setting,
whereas the proposed estimator $\hat\mu^*$ works for both diagonal and
non-diagonal covariance matrices. Thus, for a meaningful comparison, we
consider the quadratic loss function in Tong et al. (2012a). Specifically, by
letting $Q$ be diagonal with $Q^{-1} = \mathrm{diag}(\Sigma_p)$, we have the following loss
function,

$$L(\delta) = \frac{n}{p}(\delta - \mu)'[\mathrm{diag}(\Sigma_p)]^{-1}(\delta - \mu), \tag{4.10}$$

where the constant $n/p$ is applied to guarantee that $E[L(\bar X)] = 1$. In
applications, $Q$ will be estimated from the diagonal elements of the sample
covariance matrix.

We simulate $X_1, \cdots, X_n$ independently from a p-dimensional multivariate
normal distribution with mean $\mu$ and covariance matrix $\Sigma_p$. For $\mu$, we
consider two options:

(a) Let $\mu_1 = (\mu_{11}, \cdots, \mu_{1p})'$ where $\mu_{11}, \cdots, \mu_{1p}$ are i.i.d. from $N(0, \tau^2)$; and
(b) Let $\mu_2 = (\mu_{21}, \cdots, \mu_{2p})'$ where $\mu_{2k} = \tau$ for $k \le p/2$ and $\mu_{2k} = -\tau$ for
$k > p/2$.

In both options, we consider $\tau = 0.5$ and 1 to represent different levels of
mean heterogeneity. For $\Sigma_p$, we consider three covariance matrices:

(1) $\Sigma_1$ is diagonal with 20% of population eigenvalues being equal to 1,
40% being equal to 3 and 40% being equal to 10;
(2) $\Sigma_2 = \Sigma_1^{1/2}\Sigma_0\Sigma_1^{1/2}$ where $\Sigma_0 = (\sigma_{ij})_{p \times p}$ and $\sigma_{ij} = \rho^{|i-j|}$ for $1 \le i, j \le p$;
(3) $\Sigma_3 = \Sigma_1^{1/2}\Sigma_0\Sigma_1^{1/2}$ where $\Sigma_0 = (\sigma_{ij})_{p \times p}$ and $\sigma_{ij} = 1$ for $i = j$, $\sigma_{ij} = \rho$
for $i \neq j$.

We let $\rho$ range from 0.1 to 0.9 for $\Sigma_2$ and from 0.1 to 0.5 for $\Sigma_3$ to represent
four different levels of dependence structure.
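The simulation design above can be assembled as in the following sketch. The function name `simulation_design` is hypothetical, and the ordering of the variances 1, 3 and 10 along the diagonal of $\Sigma_1$ is an assumption, since the text only gives their proportions.

```python
import numpy as np

def simulation_design(p=100, rho=0.5, tau=0.5, seed=0):
    """Build the two mean vectors and three covariance matrices of Section 4."""
    rng = np.random.default_rng(seed)
    k1, k2 = p // 5, 2 * p // 5
    # Assumed ordering: first 20% variance 1, next 40% variance 3, last 40% variance 10.
    variances = np.concatenate(
        [np.full(k1, 1.0), np.full(k2, 3.0), np.full(p - k1 - k2, 10.0)]
    )
    sigma1 = np.diag(variances)
    root = np.diag(np.sqrt(variances))                  # Sigma_1^{1/2}
    idx = np.arange(p)
    ar1 = rho ** np.abs(idx[:, None] - idx[None, :])    # sigma_ij = rho^{|i-j|}
    cs = np.full((p, p), rho)
    np.fill_diagonal(cs, 1.0)                           # 1 on the diagonal, rho elsewhere
    sigma2 = root @ ar1 @ root
    sigma3 = root @ cs @ root
    mu1 = rng.normal(0.0, tau, size=p)                  # option (a)
    mu2 = np.where(idx < p // 2, tau, -tau)             # option (b)
    return mu1, mu2, sigma1, sigma2, sigma3

mu1, mu2, sigma1, sigma2, sigma3 = simulation_design()
q = 1.0 / np.diag(sigma1)
print(mu2.sum(), q @ mu2)   # grand mean is zero, yet e'Q mu_2 is not
```

With this ordering, option (b) has zero grand mean but a non-zero weighted sum $e'Q\mu_2$ under $Q^{-1} = \mathrm{diag}(\Sigma_1)$, because the positive and negative halves of $\mu_2$ carry different variances.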

The first simulation study evaluates the performance of $\hat\mu^*$ against
existing methods when $\Sigma_p = \Sigma_1$, i.e., when the covariance matrix is diagonal.
Let $p = 100$ throughout the simulations. We consider $n = 10$, 25, 50 and
100 to represent different sample sizes. Table 1 reports the average
risks of the estimators under various settings, based on 10,000 simulations.
First of all, we observe that the shrinkage methods have a smaller risk than
the sample mean $\bar X$ in nearly all settings, the exceptions being the James-Stein
estimator at the larger sample sizes. This shows that for high-dimensional data, the
shrinkage estimators do improve the standard estimation. Among the shrinkage
estimators, $\delta_T$ and $\hat\mu^*$ are among the best in most settings. The James-Stein
estimator $\delta_{JS}$ is not very competitive because it is restricted to a common
variance assumption, and $\delta_B$ is only applicable for large sample sizes.
Finally, for $\delta_T$ and $\hat\mu^*$, we note that they perform similarly when $\mu = \mu_1$, and
$\hat\mu^*$ is better by a large margin than $\delta_T$ when $\mu = \mu_2$. In addition, when the
mean heterogeneity increases from $\tau = 0.5$ to $\tau = 1$, the improvement of $\hat\mu^*$
over $\bar X$ decreases, which is consistent with Corollary 3.1. We also observe that
the improvements of the shrinkage estimators over the sample mean become
smaller when n becomes larger. This is reasonable since for large sample
sizes the sample mean itself is accurate enough and it is no longer
necessary to borrow information across dimensions to improve the estimation.
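A scaled-down version of this experiment can be sketched as follows, comparing only the sample mean and a plug-in shrinkage estimator under $\Sigma_1$ and option (b) for the mean. Several choices here are assumptions made for brevity: the true $\mathrm{diag}(\Sigma_1)^{-1}$ is used for $Q$ instead of an estimate, the ordering of the variances is fixed arbitrarily, the number of replications is small, and the U-statistic forms inside `shrink` follow the expectations given in the proof of Theorem 3.1. The numbers it produces are therefore not expected to reproduce Table 1.

```python
import numpy as np

def shrink(X, q):
    """Plug-in shrinkage estimate for a diagonal Q with diagonal entries q."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    s = X.sum(axis=0)
    diag_sum = np.sum(X * X * q)                      # sum_k X_k' Q X_k
    cross = np.sum(s * s * q) - diag_sum              # sum_{i != j} X_i' Q X_j
    proj = X @ q                                      # e' Q X_k for each k
    cross_proj = proj.sum() ** 2 - np.sum(proj ** 2)
    eQe = q.sum()
    y1 = cross / (p * (n - 1))
    y2 = (diag_sum / n - cross / (n * (n - 1))) / p
    y3 = cross_proj / (p * (n - 1) * eQe)
    y4 = proj.mean() / eQe
    alpha = (y1 - y3) / (y1 + y2 - y3)
    return alpha * xbar + (1.0 - alpha) * y4          # beta_hat * e via broadcasting

def average_risks(n=10, p=100, tau=0.5, reps=500, seed=0):
    """Average loss (4.10) of the sample mean and the shrinkage estimate."""
    rng = np.random.default_rng(seed)
    counts = [p // 5, 2 * p // 5, p - p // 5 - 2 * p // 5]
    variances = np.repeat([1.0, 3.0, 10.0], counts)   # assumed ordering
    q = 1.0 / variances                               # true Q, assumed known here
    mu = np.where(np.arange(p) < p // 2, tau, -tau)   # option (b)
    risk_mean = risk_shrink = 0.0
    for _ in range(reps):
        X = mu + rng.standard_normal((n, p)) * np.sqrt(variances)
        d0 = X.mean(axis=0) - mu
        d1 = shrink(X, q) - mu
        risk_mean += (n / p) * np.sum(d0 * d0 * q) / reps
        risk_shrink += (n / p) * np.sum(d1 * d1 * q) / reps
    return risk_mean, risk_shrink

risk_mean, risk_shrink = average_risks()
print(risk_mean, risk_shrink)
```

In this setting the sample mean should have average risk near 1 and the shrinkage estimate a visibly smaller one, in line with the qualitative pattern described above.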

The second simulation study evaluates the performance of $\hat\mu^*$ against
existing methods when the covariance matrix is non-diagonal. This is to
investigate the impact of the correlation coefficient $\rho$ on the performance
of the estimators. To achieve this, we plot in Figure 1 the average risks
of the estimators for covariance matrices $\Sigma_2$ and $\Sigma_3$ respectively, based on
100,000 replications. To save space, we only present the results for $p = 100$,
$n = 20$, $\mu = \mu_1$ and $\tau = 0.5$; the comparison patterns for other
combinations of settings remain similar. From the plots, it is evident that
the proposed $\hat\mu^*$ provides a smaller average risk than the other estimators in
most settings, no matter whether $\rho$ is small or not. We also note that (i) all the
shrinkage estimators perform worse when $\rho$ increases; and (ii) the risks of $\delta_B$
and $\delta_T$ may be even larger than 1 when the dependence structure is strong,
say for $\Sigma_3$ with $\rho > 0.35$.
$\mu$      $\tau$   n     Sample Mean   James-Stein   Berger-Bock   Tong et al.   Proposed
$\mu_1$    0.5      10    1.0064        0.5160        0.6175        0.4763        0.4867
                    25    1.0020        0.8092        0.7428        0.7195        0.7184
                    50    0.9971        0.8853        0.7865        0.7787        0.7766
                    100   0.9954        1.0426        0.9191        0.9182        0.9179
           1        10    0.9996        0.9524        0.8710        0.8266        0.8323
                    25    1.0029        0.9231        0.8937        0.8827        0.8811
                    50    0.9963        0.9868        0.9453        0.9430        0.9429
                    100   1.0019        1.0024        0.9802        0.9795        0.9793
$\mu_2$    0.5      10    1.0036        0.5693        0.6446        0.5083        0.4229
                    25    1.0010        0.8119        0.7434        0.7165        0.6160
                    50    0.9945        0.9188        0.8338        0.8268        0.7535
                    100   0.9940        0.9752        0.9062        0.9044        0.8605
           1        10    0.9983        0.8989        0.8578        0.8006        0.7300
                    25    0.9959        0.9759        0.9155        0.9062        0.8612
                    50    0.9919        0.9927        0.9489        0.9470        0.9223
                    100   0.9969        0.9949        0.9720        0.9716        0.9586

Table 1: Average risks of the estimators under various settings with
covariance matrix $\Sigma_1$.
Figure 1: Plots of the average risks of the proposed method and existing
methods when the observations are correlated (panels: Model 2 and Model 3).
Here $p = 100$, $n = 20$ and $\mu = \mu_1$ with $\tau = 0.5$.
5 An application

In this section, we illustrate the proposed shrinkage estimator using the
Leukemia data in Golub et al. (1999). The data set contains p = 7129 genes
for 47 acute lymphoblastic leukemia (ALL) and 25 acute myeloid leukemia
(AML) samples, and is available online at the website
http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi.

To evaluate the performance of the proposed estimator and compare it
with existing methods, we randomly split the 47 ALL samples into the
training set $X_1$ and the test set $X_2$. Specifically, we let the size of the training set
range from 5 to 30, with the remaining samples assigned to the test set. Let
$\bar X_1$ and $\bar X_2$ be the sample means of the training and test sets, respectively.
We further standardize the ALL and AML sets so that each array has
variance one across genes. For simplicity, we let $Q = I_p$ and let the loss function
(4.10) be $L(\delta|\mu) = (n/p)(\delta - \mu)'(\delta - \mu)$. Then to compare the performance
of the shrinkage estimator $\delta$ and the sample mean $\bar X_1$ based on the training
set, we define the empirical PRIAL as

$$\mathrm{EPR} = \frac{L(\bar X_2|\bar X_1) - L(\bar X_2|\delta)}{L(\bar X_2|\bar X_1)} = 1 - \frac{|\bar X_2 - \delta|^2}{|\bar X_2 - \bar X_1|^2}. \tag{5.11}$$

Intuitively, if $\delta$ estimates the true mean $\mu$ more accurately than the sample
mean $\bar X_1$, it will serve as a better proxy of $\mu$ and so $L(\bar X_2|\delta)$ will be smaller
than $L(\bar X_2|\bar X_1)$. As a consequence, if the estimated EPR is larger than 0, we
may claim that $\delta$ is better than $\bar X_1$. Equivalently, the EPR represents
the improvement of $\delta$ over $\bar X_1$.
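The EPR in (5.11) is straightforward to compute for a single split. The sketch below uses synthetic data as a stand-in for the Leukemia set, and the competitor `grand` (shrinking every coordinate fully to the training grand mean) is a hypothetical estimator chosen only to have something to compare; neither is taken from the paper.

```python
import numpy as np

def empirical_prial(delta, xbar_train, xbar_test):
    """EPR = 1 - |xbar_test - delta|^2 / |xbar_test - xbar_train|^2."""
    num = np.sum((xbar_test - delta) ** 2)
    den = np.sum((xbar_test - xbar_train) ** 2)
    return 1.0 - num / den

rng = np.random.default_rng(0)
data = rng.standard_normal((47, 100)) + 1.0      # synthetic stand-in, not the Leukemia data
perm = rng.permutation(47)
train, test = data[perm[:10]], data[perm[10:]]   # one random split, training size 10
xbar1, xbar2 = train.mean(axis=0), test.mean(axis=0)
grand = np.full(100, xbar1.mean())               # illustrative competitor only
epr = empirical_prial(grand, xbar1, xbar2)
print(epr)
```

Averaging `epr` over many random permutations gives the kind of curve plotted against the training set size. By construction the EPR is 0 when $\delta = \bar X_1$ and 1 when $\delta$ coincides with the test mean.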

With 10,000 simulations, we plot in Figure 2 the average EPR using the
first 100 and 200 genes of the AML and ALL sets with different sizes of
the training set. As in Section 4, it is evident that the proposed
estimator $\hat\mu^*$ outperforms the shrinkage estimator $\delta_T$ in most settings. We
also note that the improvement of $\hat\mu^*$ over $\delta_T$ becomes smaller when the size
of the training set increases. This shows that when the sample size is large,
the performance of $\hat\mu^*$ and $\delta_T$ will be very similar. Meanwhile, the decreasing
pattern of EPR in the training size indicates that both $\hat\mu^*$ and $\delta_T$ reduce to
the sample mean $\bar X_1$ when the sample size is large.
Figure 2: The average EPRs of different shrinkage estimators on Leukemia
data (panels: ALL (p=100), AML (p=100), ALL (p=200), AML (p=200);
horizontal axis: training set size).
6 Conclusion

The paper focuses on shrinkage mean estimation under the "large p small
n" setting. Specifically, we proposed a shrinkage estimator for the
population mean under arbitrary quadratic loss functions with unknown covariance
matrix. Unlike existing methods in the literature, the proposed method does
not assume a specific parametric distribution for the data and does not
require any prior information on the covariance matrix. In this sense, the
proposed estimator is a non-parametric shrinkage estimator and it works for
both diagonal and non-diagonal covariance matrices. To support the proposed
estimator, we derived some analytical results on the estimator and on the
optimal shrinkage coefficients. The estimators of the optimal shrinkage
coefficients were also derived along with some asymptotic properties. We have
also demonstrated through studies using simulated data and real
data that the proposed shrinkage estimator performs better than the
sample mean and the existing shrinkage methods under the "large p
small n" setting. Finally, we note that the proposed method (i) extends the
methods in Berger et al. (1977) and Gleser (1986) from the "small p large
n" setting to the "large p small n" setting; and (ii) extends the methods
in Berger and Bock (1976) and Tong et al. (2012a) from a diagonal
covariance matrix assumption to a non-diagonal covariance matrix assumption.
The proposed method has potential applications in different areas including
statistical genetics, epidemiology, ecology, and engineering sciences.



Appendix: Proofs of the Theorems 

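A.1. Proof of Theorem 2.1

By direct calculation, we have

$$E(\delta - \mu)'Q(\delta - \mu) = \alpha^2\Big(\mu'Q\mu + \frac{1}{n}\mathrm{tr}(Q\Sigma_p)\Big) + 2\alpha\mu'Q(\beta e - \mu) + (\beta e - \mu)'Q(\beta e - \mu)$$

$$= \alpha^2\Big(\mu'Q\mu + \frac{1}{n}\mathrm{tr}(Q\Sigma_p)\Big) - (2\alpha - 1)\mu'Q\mu + \beta^2 e'Qe - 2\beta(1 - \alpha)e'Q\mu.$$

This leads to the optimal weights as

$$\alpha^* = \frac{\mu'Q\mu - (e'Q\mu)^2/e'Qe}{\mu'Q\mu + \frac{1}{n}\mathrm{tr}(Q\Sigma_p) - (e'Q\mu)^2/e'Qe}, \qquad \beta^* = (1 - \alpha^*)\frac{e'Q\mu}{e'Qe}.$$

Further, we have

$$E(\mu^* - \mu)'Q(\mu^* - \mu) = \frac{\big(\mu'Q\mu - (e'Q\mu)^2/e'Qe\big)\frac{1}{n}\mathrm{tr}(Q\Sigma_p)}{\mu'Q\mu - (e'Q\mu)^2/e'Qe + \frac{1}{n}\mathrm{tr}(Q\Sigma_p)}.$$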



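A.2. Proof of Theorem 3.1

Without loss of generality, we assume that $E(\epsilon_{11}^4) = 3 + \Delta$. Then $E(Y_{1,n}) = (n/p)\mu'Q\mu$,
$E(Y_{2,n}) = (1/p)\mathrm{tr}(\Sigma_pQ)$, $E(Y_{3,n}) = n(\mu'Qe)^2/(p\,e'Qe)$ and $E(Y_{4,n}) = e'Q\mu/(e'Qe)$.
In addition, we have

$$\mathrm{Var}(Y_{1,n}) = \frac{2n}{(n-1)p^2}\mathrm{tr}(\Sigma_pQ\Sigma_pQ) = O\Big(\frac{1}{p}\Big),$$

$$\mathrm{Var}(Y_{2,n}) = \frac{2}{(n-1)p^2}\mathrm{tr}(\Sigma_pQ\Sigma_pQ) + \frac{\Delta}{np^2}\mathrm{tr}\big((\Sigma_p^{1/2}Q\Sigma_p^{1/2}) \circ (\Sigma_p^{1/2}Q\Sigma_p^{1/2})\big) = O\Big(\frac{1}{np}\Big),$$

$$\mathrm{Var}(Y_{4,n}) = \frac{e'Q\Sigma_pQe}{n(e'Qe)^2} = O\Big(\frac{1}{np}\Big),$$

where $A \circ B = (a_{ij}b_{ij})$ for matrices $A = (a_{ij})$ and $B = (b_{ij})$. This leads to

$$Y_{1,n} = \frac{n}{p}\mu'Q\mu + O_p\Big(\frac{1}{\sqrt{p}}\Big), \qquad Y_{2,n} = \frac{1}{p}\mathrm{tr}(\Sigma_pQ) + O_p\Big(\frac{1}{\sqrt{np}}\Big),$$

$$Y_{3,n} = \frac{n(\mu'Qe)^2}{p\,e'Qe} + O_p\Big(\frac{1}{p}\Big), \qquad Y_{4,n} = \frac{e'Q\mu}{e'Qe} + O_p\Big(\frac{1}{\sqrt{np}}\Big).$$

For $\hat\alpha^*$ and $\hat\beta^*$, we have

$$\hat\alpha^* - \alpha^* = \frac{Y_{1,n} - Y_{3,n}}{Y_{1,n} + Y_{2,n} - Y_{3,n}} - \frac{\pi_2}{\pi_1 + \pi_2} = \frac{O_p(1/\sqrt{p})}{\frac{n}{p}\pi_1 + \frac{n}{p}\pi_2 + O_p(1/\sqrt{p})} = O_p\Big(\frac{1}{\sqrt{p}}\Big)$$

and

$$\hat\beta^* - \beta^* = (1 - \hat\alpha^*)\Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big) - (\hat\alpha^* - \alpha^*)\frac{e'Q\mu}{e'Qe} = O_p\Big(\frac{1}{\sqrt{np}}\Big) + O_p\Big(\frac{e'Q\mu}{p^{3/2}}\Big).$$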



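A.3. Proof of Theorem 3.2

First consider $(\mu^* - \mu)'Q(\mu^* - \mu)$. Note that

$$(\mu^* - \mu)'Q(\mu^* - \mu) = (\alpha^*(\bar X - \mu) + \beta^* e + (\alpha^* - 1)\mu)'Q(\alpha^*(\bar X - \mu) + \beta^* e + (\alpha^* - 1)\mu)$$

$$= (\alpha^*)^2(\bar X - \mu)'Q(\bar X - \mu) + \frac{2\alpha^*}{n}(\beta^* e + (\alpha^* - 1)\mu)'Q\Sigma_p^{1/2}\sum_{k=1}^n \epsilon_k + (\beta^* e + (\alpha^* - 1)\mu)'Q(\beta^* e + (\alpha^* - 1)\mu).$$

Then,

$$\mathrm{Var}\big((\mu^* - \mu)'Q(\mu^* - \mu)\big) \le 2(\alpha^*)^4\Big[\frac{2}{n^2}\mathrm{tr}(\Sigma_pQ\Sigma_pQ) + \frac{\Delta}{n^3}\sum_{k=1}^p \phi_{kk}^2\Big] + \frac{8(\alpha^*)^2(1 - \alpha^*)^2}{n}\Big(\mu - \frac{e'Q\mu}{e'Qe}e\Big)'Q\Sigma_pQ\Big(\mu - \frac{e'Q\mu}{e'Qe}e\Big),$$

where $\Sigma_p^{1/2}Q\Sigma_p^{1/2} = (\phi_{ij})_{p \times p}$.

By the definitions of $\pi_1$ and $\pi_2$, it is easy to verify that

$$\alpha^* = \frac{\pi_2}{\pi_1 + \pi_2}, \qquad E(\mu^* - \mu)'Q(\mu^* - \mu) = \frac{\pi_1\pi_2}{\pi_1 + \pi_2},$$

$$\mathrm{Var}\big((\mu^* - \mu)'Q(\mu^* - \mu)\big) \le \Big(\frac{\pi_1\pi_2}{\pi_1 + \pi_2}\Big)^2 O\Big(\frac{1}{p}\Big).$$

Therefore,

$$(\mu^* - \mu)'Q(\mu^* - \mu) = \frac{\pi_1\pi_2}{\pi_1 + \pi_2}\Big(1 + O_p\Big(\frac{1}{\sqrt{p}}\Big)\Big). \tag{6.12}$$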



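By Theorem 3.1, we have

$$\hat\mu^* = \mu^* + O_p\Big(\frac{1}{\sqrt{p}}\Big)\bar X + \Big(O_p\Big(\frac{1}{\sqrt{np}}\Big) + O_p\Big(\frac{e'Q\mu}{p^{3/2}}\Big)\Big)e.$$

Note that

$$\hat\beta^* - \beta^* = (1 - \hat\alpha^*)Y_{4,n} - (1 - \alpha^*)\frac{e'Q\mu}{e'Qe} = (1 - \hat\alpha^*)\Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big) - (\hat\alpha^* - \alpha^*)\frac{e'Q\mu}{e'Qe},$$

$$\hat\beta^* + \beta^* = (1 - \hat\alpha^*)Y_{4,n} + (1 - \alpha^*)\frac{e'Q\mu}{e'Qe} = (2 - \hat\alpha^* - \alpha^*)\frac{e'Q\mu}{e'Qe} + (1 - \hat\alpha^*)\Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big).$$

We have

$$(\hat\mu^* - \mu)'Q(\hat\mu^* - \mu) - (\mu^* - \mu)'Q(\mu^* - \mu)$$

$$= (\hat\mu^* - \mu^*)'Q(\hat\mu^* + \mu^* - 2\mu)$$

$$= \big((\hat\alpha^* - \alpha^*)\bar X + (\hat\beta^* - \beta^*)e\big)'Q\big((\hat\alpha^* + \alpha^*)\bar X + (\hat\beta^* + \beta^*)e - 2\mu\big)$$

$$= \Big((\hat\alpha^* - \alpha^*)(\bar X - \mu) + (\alpha^* - \hat\alpha^*)\Big(\frac{e'Q\mu}{e'Qe}e - \mu\Big) + (1 - \hat\alpha^*)\Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big)e\Big)'Q$$

$$\quad \times \Big((\hat\alpha^* + \alpha^*)(\bar X - \mu) + (2 - \hat\alpha^* - \alpha^*)\Big(\frac{e'Q\mu}{e'Qe}e - \mu\Big) + (1 - \hat\alpha^*)\Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big)e\Big)$$

$$= (\hat\alpha^* - \alpha^*)\Big[(\hat\alpha^* + \alpha^*)(\bar X - \mu)'Q(\bar X - \mu) + 2(1 - \hat\alpha^* - \alpha^*)(\bar X - \mu)'Q\Big(\frac{e'Q\mu}{e'Qe}e - \mu\Big)\Big]$$

$$\quad + 2\hat\alpha^*(1 - \hat\alpha^*)\Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big)(\bar X - \mu)'Qe + \Big(Y_{4,n} - \frac{e'Q\mu}{e'Qe}\Big)^2(1 - \hat\alpha^*)^2 e'Qe$$

$$\quad + (\alpha^* - \hat\alpha^*)(2 - \hat\alpha^* - \alpha^*)\Big(\mu - \frac{e'Q\mu}{e'Qe}e\Big)'Q\Big(\mu - \frac{e'Q\mu}{e'Qe}e\Big)$$

$$= \pi_1\Big(O_p\Big(\frac{1}{\sqrt{p}}\Big) + O_p\Big(\frac{1}{p}\Big)\Big) = \pi_1 O_p\Big(\frac{1}{\sqrt{p}}\Big), \tag{6.13}$$

where we used the facts that

$$(\bar X - \mu)'Q(\bar X - \mu) = \pi_1\Big(1 + O_p\Big(\frac{1}{\sqrt{p}}\Big)\Big), \qquad (\bar X - \mu)'Qe = O_p\Big(\sqrt{\frac{p}{n}}\Big),$$

and

$$\hat\alpha^* - \alpha^* = O_p\Big(\frac{1}{\sqrt{p}}\Big).$$

Finally, by (6.12) and (6.13) we have

$$(\hat\mu^* - \mu)'Q(\hat\mu^* - \mu) = \pi_1\Big(\frac{\pi_2}{\pi_1 + \pi_2} + O_p\Big(\frac{1}{\sqrt{p}}\Big)\Big). \tag{6.14}$$

This completes the proof of Theorem 3.2.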